Discarding Noise in an Automatically Acquired Lexicon of Support Verb Constructions

Author

Begoña Villada Moirón

Abstract

We applied data-driven methods to automatically acquire Dutch prepositional support verb constructions (SVCs) from corpora, e.g., iets in de gaten houden (“keep an eye on something”). This paper addresses the question of whether linguistic diagnostics help to discard noise from the n-best lists, and how to apply such diagnostics (semi-)automatically to parsed corpora. We show that some of the linguistic diagnostics proposed in Hollebrandse (1993) effectively identify SVCs and contribute to a modest decrease in error rate.

Similar resources

The Lexicon-Grammar of Italian Idioms

This paper presents the Lexicon-Grammar classification of Italian idioms that has been constructed on formal principles and, as such, can be exploited in information extraction. Among MWEs, idioms are those fixed constructions which are hard to automatically detect, given their syntactic flexibility and lexical variation. The syntactic properties of idioms have been formally represented and cod...

Deverbal Nouns in Czech Light Verb Constructions

In this paper, we provide a well-founded description of Czech deverbal nouns in both nominal and verbal structures (light verb constructions), based on a complex interaction between the lexicon and the grammar. We show that light verb constructions result from a regular syntactic operation. We introduce two interlinked valency lexicons, NomVallex and VALLEX, demonstrating how to minimize the s...

Automatic translation of support verb constructions

M. Gross (1981) calls such verbs 'support verbs', and I shall adopt his terminology. These verbs exhibit many interesting properties which have been studied systematically for several French support verbs: faire (make), avoir (have), prendre (take), être (be), etc. An examination of the results indicates that support verbs must be taken into account in the parser and in the lexicon of a progr...

Hindi CCGbank: CCG Treebank from the Hindi Dependency Treebank

In this paper, we present an approach for automatically creating a Combinatory Categorial Grammar (CCG) treebank from a dependency treebank for the Subject-Object-Verb language Hindi. Rather than a direct conversion from dependency trees to CCG trees, we propose a two-stage approach: a language-independent generic algorithm first extracts a CCG lexicon from the dependency treebank. A determinis...

Semi-automatic Building of Swedish Collocation Lexicon

This work focuses on semi-automatic extraction of verb-noun collocations from a corpus, performed to provide lexical evidence for the manual lexicographical processing of Support Verb Constructions (SVCs) in the Swedish-Czech Combinatorial Valency Lexicon of Predicate Nouns. Efficiency of pure manual extraction procedure is significantly improved by utilization of automatic statistical methods ...

Publication date: 2004
